Rationale and Research Questions

The transition to renewable energy is a critical issue for mitigating climate change and promoting sustainable development. As the world’s second-largest carbon emitter (https://www.epa.gov/ghgemissions/global-greenhouse-gas-emissions-data), the United States has set ambitious targets for reducing greenhouse gas emissions and increasing renewable energy generation. However, the factors that drive renewable energy adoption in the US are not fully understood, and there is a need for more research to identify the most effective policies and incentives for promoting clean energy.

In this project, we aim to investigate the key drivers of renewable energy adoption in the US and their relative importance. We will use a comprehensive dataset derived from public-facing websites which includes information on state-level renewable energy capacity, energy policies, economic indicators, and demographic variables.

Our research questions include:

1. How do economic indicators, such as energy prices, GDP, and oil production, influence renewable energy adoption in different regions of the US?
2. What role do demographic variables, such as IQ and political affiliation, play in shaping attitudes towards renewable energy and driving adoption?

Project Setup

knitr::opts_chunk$set(echo = TRUE)
# Load core packages
library(tidyverse);library(lubridate);library(viridis);library(here)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Loading required package: timechange
## 
## 
## Attaching package: 'lubridate'
## 
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
## 
## 
## Loading required package: viridisLite
## 
## here() starts at /home/guest/R/LiuSmootZhang_ENV872_EDA_FinalProject
#install.packages("rvest")
library(rvest)
## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding
#install.packages("dataRetrieval")
library(dataRetrieval)

#install.packages("tidycensus")
library(tidycensus)
# dplyr and ggplot2 are attached by tidyverse; rvest is already loaded above

#install.packages("corrplot")
library(corrplot)
## corrplot 0.92 loaded
library(stringr)

library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(leaflet)
library(mapview); mapviewOptions(fgb = FALSE)
sf::sf_use_s2(FALSE)
## Spherical geometry (s2) switched off

Dataset information

The dataset for this analysis was collected from various sources including the National Renewable Energy Laboratory (NREL), the United States Energy Information Administration (EIA), and wisevoter.com. The dataset includes information on various factors that may drive renewable energy adoption in the US, such as state-level policies, economic indicators, demographic characteristics, and natural resources.

The dataset was first imported into R as separate data frames and then merged into a single data frame using the merge() function. Before merging, data cleaning and manipulation were performed to ensure consistency across variables and eliminate missing values. For instance, some variables were renamed, and missing values were replaced with appropriate values or removed entirely.

The scraped state-rankings data contains 51 observations (the 50 states plus the District of Columbia) and 7 explanatory variables. The variables, with units and ranges as scraped, are summarized in the table below:

| Variable | Unit | Range | Data Source |
|----------------------------------|-------------------------|------------------------------|---------------|
| Good_Air_Quality_Days_Percentage | Percentage | 60.6-99.4 | wisevoter.com |
| Oil_Production | Thousand barrels (Mbbl) | 3-1,741,444 (missing where no production is reported) | wisevoter.com |
| GDP | US dollars | 37.6 billion-3.51 trillion | wisevoter.com |
| Sunshine | kJ/m² | 3,467-5,755 | wisevoter.com |
| Electricity_Cost | Dollars per kWh | 0.08-0.30 | wisevoter.com |
| Average_IQ | IQ points | 94.2-104.3 | wisevoter.com |
| Republican_vote | Percentage | 5.4-69.9 | wisevoter.com |

Data Scraping

url_aq <- "https://wisevoter.com/state-rankings/air-quality-by-state/"
page1 <- read_html(url_aq)

states <- page1 %>% 
  html_nodes("td:nth-child(2)") %>% 
  html_text()

good.air.days <- page1 %>% 
  html_nodes("td:nth-child(3)") %>% 
  html_text()

df_aq <- data.frame(
  States = states,
  Good_Air_Quality_Days_Percentage = good.air.days
)
url_oil <- "https://wisevoter.com/state-rankings/oil-production-by-state/"
page2 <- read_html(url_oil)

states2 <- page2 %>% 
  html_nodes("td:nth-child(2)") %>% 
  html_text()

oil.production <- page2 %>% 
  html_nodes("td:nth-child(3)") %>% 
  html_text()

df_oil <- data.frame(
  States = states2,
  Oil_Production = oil.production
)
url_gdp <- "https://wisevoter.com/state-rankings/gdp-by-state/"
page <- read_html(url_gdp)

states3 <- page %>% 
  html_nodes("td:nth-child(2)") %>% 
  html_text()

gdp <- page %>% 
  html_nodes("td:nth-child(3)") %>% 
  html_text()

df_gdp <- data.frame(
  States = states3,
  GDP = gdp
)
url_sunny <- "https://wisevoter.com/state-rankings/sunniest-states/"
page <- read_html(url_sunny)

states4 <- page %>% 
  html_nodes("td:nth-child(2)") %>% 
  html_text()

sunshine <- page %>% 
  html_nodes("td:nth-child(3)") %>% 
  html_text()

df_sunny <- data.frame(
  States = states4,
  Sunshine = sunshine
)
url_elec <- "https://wisevoter.com/state-rankings/electricity-cost-by-state/"
page <- read_html(url_elec)

states5 <- page %>% 
  html_nodes("td:nth-child(2)") %>% 
  html_text()

cost <- page %>% 
  html_nodes("td:nth-child(3)") %>% 
  html_text()

df_ecost <- data.frame(
  States = states5,
  Electricity_Cost = cost
)
url_iq <- "https://wisevoter.com/state-rankings/average-iq-by-state/"
page <- read_html(url_iq)

states6 <- page %>% 
  html_nodes("td:nth-child(2)") %>% 
  html_text()

iq <- page %>% 
  html_nodes("td:nth-child(3)") %>% 
  html_text()

df_iq <- data.frame(
  States = states6,
  Average_IQ = iq
)
url_party <- "https://wisevoter.com/state-rankings/red-and-blue-states"
page <- read_html(url_party)

states7 <- page %>% 
  html_nodes("td:nth-child(1)") %>% 
  html_text()

rep_vote <- page %>% 
  html_nodes("td:nth-child(3)") %>% 
  html_text()

df_party <- data.frame(
  States = states7,
  Republican_vote = rep_vote
)
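
The seven scraping blocks above repeat one pattern: read a page, pull a state column and a value column by CSS selector, and bind them into a data frame. A minimal sketch of that pattern as a function follows. The helper name `scrape_columns` and its arguments are hypothetical and not part of the original analysis, and the demo parses a small inline HTML string instead of the live site so that it runs without network access.

```r
library(rvest)

# Hypothetical helper: extract two <td> columns from a parsed page and
# return them as a data frame with columns States and <value_name>.
scrape_columns <- function(page, value_name, state_col = 2, value_col = 3) {
  pick <- function(i) {
    html_text(html_nodes(page, sprintf("td:nth-child(%d)", i)))
  }
  out <- data.frame(States = pick(state_col), value = pick(value_col),
                    stringsAsFactors = FALSE)
  names(out)[2] <- value_name
  out
}

# Offline stand-in for a ranking page (rank, state, value per row)
demo_page <- read_html("<html><body><table>
  <tr><td>1</td><td>Hawaii</td><td>99.4%</td></tr>
  <tr><td>2</td><td>Maine</td><td>93.2%</td></tr>
</table></body></html>")

df_demo <- scrape_columns(demo_page, "Good_Air_Quality_Days_Percentage")
df_demo
```

For a live page the call would look like `scrape_columns(read_html(url_aq), "Good_Air_Quality_Days_Percentage")`; the party page would need `state_col = 1`, matching the selector used above.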
# Merge data frames
state_rankings_df_merge <- Reduce(function(x, y) merge(x, y, by = "States", all = TRUE), 
                   list(df_aq, df_oil, df_gdp, df_sunny, df_ecost, df_iq, df_party))

# View merged data frame
state_rankings_df_merge
##                  States Good_Air_Quality_Days_Percentage Oil_Production
## 1               Alabama                            85.8%       425 Mbbl
## 2                Alaska                            92.3%   159,623 Mbbl
## 3               Arizona                            64.6%         6 Mbbl
## 4              Arkansas                              83%     4,179 Mbbl
## 5            California                            60.6%   130,593 Mbbl
## 6              Colorado                            75.4%   144,621 Mbbl
## 7           Connecticut                            83.4%           <NA>
## 8              Delaware                            80.3%           <NA>
## 9  District of Columbia                              69%           <NA>
## 10              Florida                            89.8%     1,492 Mbbl
## 11              Georgia                            82.1%           <NA>
## 12               Hawaii                            99.4%           <NA>
## 13                Idaho                              79%        25 Mbbl
## 14             Illinois                              75%     7,054 Mbbl
## 15              Indiana                            80.9%     1,523 Mbbl
## 16                 Iowa                            78.3%           <NA>
## 17               Kansas                            80.8%    27,518 Mbbl
## 18             Kentucky                            85.2%     2,464 Mbbl
## 19            Louisiana                            87.9%    32,843 Mbbl
## 20                Maine                            93.2%           <NA>
## 21             Maryland                            84.2%           <NA>
## 22        Massachusetts                            87.1%           <NA>
## 23             Michigan                            79.9%     4,454 Mbbl
## 24            Minnesota                            83.4%           <NA>
## 25          Mississippi                            85.5%     1,343 Mbbl
## 26             Missouri                            86.1%        50 Mbbl
## 27              Montana                            80.8%    18,731 Mbbl
## 28             Nebraska                            86.8%     1,593 Mbbl
## 29               Nevada                            72.5%       210 Mbbl
## 30        New Hampshire                              91%           <NA>
## 31           New Jersey                            82.3%           <NA>
## 32           New Mexico                            78.9%   459,792 Mbbl
## 33             New York                            88.7%       178 Mbbl
## 34       North Carolina                            86.3%           <NA>
## 35         North Dakota                            81.1%   393,763 Mbbl
## 36                 Ohio                            80.9%    18,147 Mbbl
## 37             Oklahoma                            76.7%   143,052 Mbbl
## 38               Oregon                              83%           <NA>
## 39         Pennsylvania                            79.2%     6,357 Mbbl
## 40         Rhode Island                            86.7%           <NA>
## 41       South Carolina                            84.9%           <NA>
## 42         South Dakota                            82.1%     1,028 Mbbl
## 43            Tennessee                            84.5%       206 Mbbl
## 44                Texas                            80.5% 1,741,444 Mbbl
## 45                 Utah                            70.2%    36,317 Mbbl
## 46              Vermont                            91.3%           <NA>
## 47             Virginia                            91.5%         3 Mbbl
## 48           Washington                            88.7%           <NA>
## 49        West Virginia                            86.7%    18,918 Mbbl
## 50            Wisconsin                            79.8%           <NA>
## 51              Wyoming                            78.9%    84,753 Mbbl
##                   GDP    Sunshine Electricity_Cost Average_IQ Republican_vote
## 1    $257,465,000,000 4,660 kJ/m²    $0.1  per kWh       95.7             62%
## 2     $57,983,000,000        <NA>    $0.2  per kWh         99           52.8%
## 3    $429,819,000,000 5,755 kJ/m²   $0.11  per kWh       97.4           49.1%
## 4    $150,483,000,000 4,724 kJ/m²   $0.09  per kWh       97.5           62.4%
## 5  $3,513,348,000,000 5,050 kJ/m²    $0.2  per kWh       95.5           34.3%
## 6    $440,903,000,000 4,960 kJ/m²   $0.11  per kWh      101.6           41.9%
## 7    $308,671,000,000 3,988 kJ/m²   $0.18  per kWh      103.1           39.2%
## 8     $84,206,000,000 4,232 kJ/m²   $0.11  per kWh      100.4           39.8%
## 9    $156,457,000,000 4,307 kJ/m²   $0.13  per kWh       <NA>            5.4%
## 10 $1,286,086,000,000 4,859 kJ/m²   $0.11  per kWh       98.4           51.2%
## 11   $713,948,000,000 4,661 kJ/m²    $0.1  per kWh         98           49.2%
## 12    $94,937,000,000        <NA>    $0.3  per kWh       95.6           34.3%
## 13    $98,455,000,000 4,251 kJ/m²   $0.08  per kWh      101.4           63.8%
## 14   $973,486,000,000 4,380 kJ/m²    $0.1  per kWh       99.9           40.6%
## 15   $438,012,000,000 4,318 kJ/m²    $0.1  per kWh      101.7             57%
## 16   $225,696,000,000 4,331 kJ/m²   $0.09  per kWh      103.2           53.1%
## 17   $198,291,000,000 4,890 kJ/m²   $0.11  per kWh      102.8           56.2%
## 18   $244,480,000,000 4,383 kJ/m²   $0.09  per kWh       99.4           62.1%
## 19   $267,126,000,000 4,725 kJ/m²   $0.09  per kWh       95.3           58.5%
## 20    $79,177,000,000 3,815 kJ/m²   $0.14  per kWh      103.4             44%
## 21   $451,986,000,000 4,267 kJ/m²   $0.12  per kWh       99.7           32.2%
## 22   $663,750,000,000 3,944 kJ/m²   $0.19  per kWh      104.3           32.1%
## 23   $592,349,000,000 4,018 kJ/m²   $0.13  per kWh      100.5           47.8%
## 24   $429,391,000,000 3,968 kJ/m²   $0.11  per kWh      103.7           45.3%
## 25   $129,974,000,000 4,693 kJ/m²    $0.1  per kWh       94.2           57.6%
## 26   $373,105,000,000 4,545 kJ/m²    $0.1  per kWh        101           56.8%
## 27    $61,983,000,000 3,847 kJ/m²    $0.1  per kWh      103.4           56.9%
## 28   $154,149,000,000 4,685 kJ/m²   $0.09  per kWh      102.3           58.2%
## 29   $204,306,000,000 5,296 kJ/m²   $0.09  per kWh       96.5           47.7%
## 30   $102,439,000,000 3,891 kJ/m²   $0.17  per kWh      104.2           45.4%
## 31   $700,119,000,000 4,056 kJ/m²   $0.14  per kWh      102.8           41.4%
## 32   $114,680,000,000 5,642 kJ/m²    $0.1  per kWh       95.7           43.5%
## 33 $1,914,207,000,000 3,904 kJ/m²   $0.16  per kWh      100.7           37.8%
## 34   $684,607,000,000 4,466 kJ/m²   $0.09  per kWh      100.2           49.9%
## 35    $66,372,000,000 3,925 kJ/m²   $0.09  per kWh      103.8           65.1%
## 36   $765,003,000,000 4,139 kJ/m²    $0.1  per kWh      101.8           53.3%
## 37   $218,564,000,000 4,912 kJ/m²   $0.09  per kWh       99.3           65.4%
## 38   $279,424,000,000 3,830 kJ/m²   $0.09  per kWh      101.2           40.4%
## 39   $874,881,000,000 3,939 kJ/m²    $0.1  per kWh      101.5           48.8%
## 40    $68,823,000,000 3,989 kJ/m²   $0.18  per kWh       99.5           38.6%
## 41   $281,754,000,000 4,624 kJ/m²    $0.1  per kWh       98.4           55.1%
## 42    $62,809,000,000 4,332 kJ/m²    $0.1  per kWh      102.8           61.8%
## 43   $439,050,000,000 4,486 kJ/m²    $0.1  per kWh       97.7           60.7%
## 44 $2,104,579,000,000 5,137 kJ/m²   $0.09  per kWh        100           52.1%
## 45   $230,342,000,000 4,887 kJ/m²   $0.08  per kWh      101.1           58.1%
## 46    $37,644,000,000 3,826 kJ/m²   $0.16  per kWh      103.8           30.7%
## 47   $614,754,000,000 4,354 kJ/m²   $0.09  per kWh      101.9             44%
## 48   $696,748,000,000 3,467 kJ/m²   $0.09  per kWh      101.9           38.8%
## 49    $91,953,000,000 4,146 kJ/m²   $0.09  per kWh       98.7           68.6%
## 50   $379,909,000,000 4,023 kJ/m²   $0.11  per kWh      102.9           48.8%
## 51    $44,323,000,000 4,471 kJ/m²   $0.08  per kWh      102.4           69.9%
write.csv(state_rankings_df_merge, file = "state rankings_raw.csv",
          row.names = FALSE)

Data Cleaning

Next, we clean the data scraped from the various web pages before merging it with the power plant data. At this point every column in the data frame is a character vector. After removing the units (such as "per kWh") and symbols (such as commas and "$"), we convert the strings to numeric values and create additional columns as needed from the existing ones.

# read in raw data
state_ranking <- read.csv('state rankings_raw.csv', header = TRUE)
# save a copy of raw data before making changes
state_ranking_raw <- state_ranking

# remove % in good_air_q col
state_ranking$Good_Air_Quality_Days_Percentage <- str_sub(state_ranking$Good_Air_Quality_Days_Percentage, end = -2)
# change chr to numeric
state_ranking$Good_Air_Quality_Days_Percentage <- as.numeric(state_ranking$Good_Air_Quality_Days_Percentage)

# remove unit in oil_production
state_ranking$Oil_Production <- str_sub(state_ranking$Oil_Production, end = -5)
# remove comma 
state_ranking$Oil_Production <- gsub(',','', state_ranking$Oil_Production)
# change chr to numeric
state_ranking$Oil_Production <- as.numeric(state_ranking$Oil_Production)

# remove $ in GDP
state_ranking$GDP <- str_sub(state_ranking$GDP, 2)
# remove comma
state_ranking$GDP <- gsub(',','', state_ranking$GDP)
# change to numeric
state_ranking$GDP <- as.numeric(state_ranking$GDP)

# remove unit in sunshine
state_ranking$Sunshine <- str_sub(state_ranking$Sunshine, end=-6)
# remove comma
state_ranking$Sunshine <- gsub(',','', state_ranking$Sunshine)
# change to numeric
state_ranking$Sunshine <- as.numeric(state_ranking$Sunshine)

# clean electricity_cost
state_ranking$Electricity_Cost <- str_sub(state_ranking$Electricity_Cost, end=-8)
state_ranking$Electricity_Cost <- str_sub(state_ranking$Electricity_Cost, 2)
state_ranking$Electricity_Cost <- as.numeric(state_ranking$Electricity_Cost)

# clean average IQ (no unit to strip; convert to numeric)
state_ranking$Average_IQ <- as.numeric(state_ranking$Average_IQ)

# clean republican_vote
state_ranking$Republican_vote <- str_sub(state_ranking$Republican_vote, end=-2)
state_ranking$Republican_vote <- as.numeric(state_ranking$Republican_vote)

# create new column: party affiliation
state_ranking$party <- with(state_ranking, ifelse(Republican_vote > 50, "Red", "Blue"))
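
The cleaning steps above rely on fixed character offsets in `str_sub()`, which silently break if the site changes a unit label. One alternative, shown here only as a sketch, is to strip every character that cannot be part of a number. The helper name `to_number` is hypothetical, and the approach assumes the unit text itself contains no digits, periods, or minus signs (it would misread a unit written as "m2", for example).

```r
# Hypothetical helper: keep digits, decimal points, and minus signs only,
# then convert to numeric. NA inputs stay NA.
to_number <- function(x) {
  as.numeric(gsub("[^0-9.-]", "", x))
}

# Sample strings in the formats that appear in the scraped table
raw <- c("$257,465,000,000", "4,660 kJ/m²", "$0.1  per kWh",
         "85.8%", "425 Mbbl", NA)
cleaned <- to_number(raw)
cleaned
```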

The response variable, the clean energy adoption rate, requires some data wrangling and calculation. We first set all missing capacity values to 0 so that we can sum the capacity across energy sources (a sum that includes an NA would itself be NA). The next step is to calculate the total clean energy capacity and the total generation capacity for each power plant. After that, we aggregate the power plants by state and calculate the clean energy share for each state.

Power_plant <- read.csv(here("data_raw/Power_Plants.csv"), stringsAsFactors = TRUE)

# replace NAs with 0 in the numeric (capacity) columns only;
# assigning 0 to factor columns would trigger "invalid factor level" warnings
num_cols <- sapply(Power_plant, is.numeric)
Power_plant[num_cols][is.na(Power_plant[num_cols])] <- 0
Power_plant <- Power_plant %>%
  mutate(Total_clean = Bio_MW + Geo_MW + Hydro_MW + HydroPS_MW + Nuclear_MW + Solar_MW + Wind_MW) %>%
  mutate(Total = Total_clean + Coal_MW + NG_MW + Crude_MW)

Clean_energy <- aggregate(cbind(Power_plant$Total_clean, Power_plant$Total), list(Power_plant$State), FUN = sum)
colnames(Clean_energy) <- c("States", "Clean_energy_MW", "Total_MW")
Clean_energy$Clean_percent <- Clean_energy$Clean_energy_MW/Clean_energy$Total_MW
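
To make the calculation concrete, here is the same sequence on a toy data frame. The plant values and the two-source layout are invented for illustration and are not taken from the power plant file.

```r
# Toy data: three plants in two states, with NAs where a source is absent
plants <- data.frame(
  State    = c("A", "A", "B"),
  Solar_MW = c(10, NA, 15),
  Coal_MW  = c(NA, 30, 5)
)

# Without this step, 10 + NA would give NA for the first plant
num_cols <- sapply(plants, is.numeric)
plants[num_cols][is.na(plants[num_cols])] <- 0

# Per-plant totals, then per-state sums and the clean share
plants$Total_clean <- plants$Solar_MW
plants$Total       <- plants$Solar_MW + plants$Coal_MW
share <- aggregate(cbind(Total_clean, Total) ~ State, data = plants, FUN = sum)
share$Clean_percent <- share$Total_clean / share$Total
share
```

State A ends up with 10 of 40 MW clean (0.25) and state B with 15 of 20 MW (0.75). Note that `Clean_percent` is a proportion between 0 and 1, not a percentage.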

We merge the two cleaned datasets into one final dataset that is ready for analysis.

final_df <- merge(Clean_energy, state_ranking)

Exploratory Analysis

Correlation plots

We use a correlation plot to explore the pairwise correlations among the response and explanatory variables.

final_naomit <- final_df %>%
  select(Clean_percent:Republican_vote) %>%
  na.omit()
final_corr <- cor(final_naomit)
corrplot.mixed(final_corr, upper = "ellipse", tl.cex = 0.6)

Histogram

We take a closer look at the distribution of each variable using histograms. Based on the histograms, oil production, GDP, and electricity cost are skewed, which indicates that a transformation is needed for those variables.

par(mfrow=c(2,4))
hist(final_naomit$Clean_percent, xlab = "Clean Energy Adoption %", main = NULL)
hist(final_naomit$Good_Air_Quality_Days_Percentage, xlab = "Good Air Quality Days %", main = NULL)
hist(final_naomit$Oil_Production, xlab = "Oil Production", main = NULL)
hist(final_naomit$GDP, xlab = "GDP", main = NULL)
hist(final_naomit$Sunshine, xlab = "Sunshine", main = NULL)
hist(final_naomit$Electricity_Cost, xlab = "Electricity Cost", main = NULL)
hist(final_naomit$Average_IQ, xlab = "Average IQ", main = NULL)
hist(final_naomit$Republican_vote, xlab = "Republican Vote", main = NULL)
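
The effect of a log transformation on a skewed variable can be illustrated numerically. The sketch below uses the GDP values of the first ten rows of the scraped table (converted to billions of dollars) and a simple moment-based skewness. The `skewness` helper is defined here for illustration; it is not a function from any package loaded above.

```r
# GDP in billions of dollars, first ten rows of the scraped table
gdp_bn <- c(257.465, 57.983, 429.819, 150.483, 3513.348,
            440.903, 308.671, 84.206, 156.457, 1286.086)

# Moment-based sample skewness: third central moment over
# the second central moment raised to the power 1.5
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw <- skewness(gdp_bn)       # strongly right-skewed
skew_log <- skewness(log(gdp_bn))  # much closer to symmetric
c(raw = skew_raw, logged = skew_log)
```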

Map

Next, we visualize the power plant data with an interactive map widget built with the leaflet package. We first categorize each power plant as “clean energy” or “others” based on its primary source attribute: biomass, geothermal, hydroelectric, pumped storage, nuclear, solar, and wind count as clean. On the map, clean energy plants are plotted in blue and all others in red; marker size scales with each plant's total capacity, which the original data reports in megawatts. You can pan and zoom to explore the dataset.

# read in the shapefile
powerp <- st_read('../LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/Power_Plants.shp')
## Reading layer `Power_Plants' from data source 
##   `/home/guest/R/LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/Power_Plants.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 11569 features and 32 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -171.7124 ymin: 17.94712 xmax: -65.27956 ymax: 71.292
## Geodetic CRS:  WGS 84
powerp$PrimSource <- as.factor(powerp$PrimSource)

unique(powerp$PrimSource)
##  [1] petroleum      hydroelectric  natural gas    nuclear        coal          
##  [6] pumped storage geothermal     wind           biomass        batteries     
## [11] other          solar         
## 12 Levels: batteries biomass coal geothermal hydroelectric ... wind
powerp$PrimSource <- str_trim(powerp$PrimSource, 'both')
# clean energy: biomass, geothermal, hydroelectric, pumped storage, nuclear, solar, wind
# (the level is 'pumped storage', with a space, as listed by unique() above)
clean_sources <- c('biomass', 'geothermal', 'hydroelectric', 'pumped storage',
                   'nuclear', 'solar', 'wind')
# create new column: clean energy
powerp$clean <- ifelse(powerp$PrimSource %in% clean_sources, 'clean energy', 'others')

powerp$clean <- as.factor(powerp$clean)

# colors for the map widget
pal <- colorFactor(c('skyblue','tomato'), powerp$clean)
# make the map widget
leaflet(powerp) %>% 
  addProviderTiles(providers$CartoDB.Positron)%>%  
  addCircleMarkers(
    radius = ~sqrt(Total_MW)/5,
    color = ~pal(clean),
    stroke = FALSE, fillOpacity = 0.5
  ) %>% 
  addLegend("bottomright", pal = pal, values = ~clean,
    title = "Power plants",
    opacity = 1
  )

We then mapped the clean energy share of each state. With help from https://rstudio.github.io/leaflet/choropleths.html, we were able to make this interactive map widget. You can pan, zoom, and hover your mouse over each state to see its name and clean energy share in a label. The state boundaries data came from https://www.sciencebase.gov/catalog/item/52c78623e4b060b9ebca5be5.

# read in the shapefile
states_shp <- st_read('../LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/tl_2012_us_state.shp')
## Reading layer `tl_2012_us_state' from data source 
##   `/home/guest/R/LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/tl_2012_us_state.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -19951910 ymin: -1643352 xmax: 20021890 ymax: 11554790
## Projected CRS: Popular Visualisation CRS / Mercator
states_shp <- st_transform(states_shp, crs = 4326)

# join the dataframe with clean energy percentage by state to the state shapefile
join_states <- full_join(states_shp, final_df, by = c('NAME' = 'States'))

# visualize on map
bins <- c(0, 0.20, 0.40, 0.60, 0.80, 1.00)
pal2 <- colorBin("Greens", domain = join_states$Clean_percent, bins = bins)

labels <- sprintf(
  "<strong>%s</strong><br/>%g ",
  join_states$NAME, join_states$Clean_percent) %>% lapply(htmltools::HTML)

leaflet(join_states) %>%
  setView(-96, 37.8, 3) %>%
  addPolygons(
    fillColor = ~pal2(Clean_percent),
    weight = 2,
    opacity = 1,
    color = "white",
    dashArray = "3",
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(
      weight = 5,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.7,
      bringToFront = TRUE),
    label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "15px",
      direction = "auto")) %>%
  addLegend(pal = pal2, values = ~Clean_percent, opacity = 0.7, title = NULL,
    position = "bottomright")

Analysis

Regression Analysis

Given the skewed distribution of certain variables, we use their log-transformed versions in a linear model fitted with lm(). To avoid over-parameterizing the model, we run a stepwise regression that uses Akaike’s Information Criterion (AIC) to remove explanatory variables from the full set of candidates.

We feed all potential explanatory variables into the full model and use the step() function to select the model with the lowest AIC. Based on the stepwise regression, we narrow down the explanatory variables to air quality, logged GDP, average IQ, and Republican vote.

TPAIC <- lm(data = final_naomit, Clean_percent ~ Good_Air_Quality_Days_Percentage + log(Oil_Production) + log(GDP) + Sunshine + log(Electricity_Cost) + Average_IQ + Republican_vote)

#Choose a model by AIC in a Stepwise Algorithm
step(TPAIC)
## Start:  AIC=-108.61
## Clean_percent ~ Good_Air_Quality_Days_Percentage + log(Oil_Production) + 
##     log(GDP) + Sunshine + log(Electricity_Cost) + Average_IQ + 
##     Republican_vote
## 
##                                    Df Sum of Sq     RSS     AIC
## - Sunshine                          1  0.007797 0.56449 -110.18
## - log(Oil_Production)               1  0.009092 0.56578 -110.11
## - log(Electricity_Cost)             1  0.025375 0.58206 -109.23
## - Republican_vote                   1  0.035648 0.59234 -108.69
## <none>                                          0.55669 -108.61
## - Good_Air_Quality_Days_Percentage  1  0.039261 0.59595 -108.50
## - Average_IQ                        1  0.093106 0.64979 -105.82
## - log(GDP)                          1  0.157498 0.71419 -102.89
## 
## Step:  AIC=-110.18
## Clean_percent ~ Good_Air_Quality_Days_Percentage + log(Oil_Production) + 
##     log(GDP) + log(Electricity_Cost) + Average_IQ + Republican_vote
## 
##                                    Df Sum of Sq     RSS     AIC
## - log(Oil_Production)               1  0.008197 0.57268 -111.73
## - log(Electricity_Cost)             1  0.018398 0.58288 -111.19
## <none>                                          0.56449 -110.18
## - Republican_vote                   1  0.048844 0.61333 -109.61
## - Good_Air_Quality_Days_Percentage  1  0.074433 0.63892 -108.34
## - Average_IQ                        1  0.101302 0.66579 -107.06
## - log(GDP)                          1  0.159932 0.72442 -104.45
## 
## Step:  AIC=-111.73
## Clean_percent ~ Good_Air_Quality_Days_Percentage + log(GDP) + 
##     log(Electricity_Cost) + Average_IQ + Republican_vote
## 
##                                    Df Sum of Sq     RSS     AIC
## - log(Electricity_Cost)             1  0.017277 0.58996 -112.81
## <none>                                          0.57268 -111.73
## - Republican_vote                   1  0.054567 0.62725 -110.91
## - Good_Air_Quality_Days_Percentage  1  0.066666 0.63935 -110.32
## - Average_IQ                        1  0.097791 0.67047 -108.85
## - log(GDP)                          1  0.160578 0.73326 -106.07
## 
## Step:  AIC=-112.81
## Clean_percent ~ Good_Air_Quality_Days_Percentage + log(GDP) + 
##     Average_IQ + Republican_vote
## 
##                                    Df Sum of Sq     RSS     AIC
## <none>                                          0.58996 -112.81
## - Good_Air_Quality_Days_Percentage  1  0.075281 0.66524 -111.09
## - Average_IQ                        1  0.090956 0.68092 -110.37
## - Republican_vote                   1  0.097617 0.68758 -110.07
## - log(GDP)                          1  0.144389 0.73435 -108.03
## 
## Call:
## lm(formula = Clean_percent ~ Good_Air_Quality_Days_Percentage + 
##     log(GDP) + Average_IQ + Republican_vote, data = final_naomit)
## 
## Coefficients:
##                      (Intercept)  Good_Air_Quality_Days_Percentage  
##                         1.844051                         -0.007738  
##                         log(GDP)                        Average_IQ  
##                        -0.091895                          0.020877  
##                  Republican_vote  
##                        -0.009484
TPmodel <- lm(data = final_naomit, Clean_percent ~ Good_Air_Quality_Days_Percentage + log(GDP) + Average_IQ + Republican_vote)
summary(TPmodel)
## 
## Call:
## lm(formula = Clean_percent ~ Good_Air_Quality_Days_Percentage + 
##     log(GDP) + Average_IQ + Republican_vote, data = final_naomit)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21917 -0.09577  0.00624  0.08894  0.35866 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                       1.844051   1.591740   1.159   0.2572  
## Good_Air_Quality_Days_Percentage -0.007738   0.004248  -1.821   0.0801 .
## log(GDP)                         -0.091895   0.036429  -2.523   0.0181 *
## Average_IQ                        0.020877   0.010428   2.002   0.0558 .
## Republican_vote                  -0.009484   0.004572  -2.074   0.0481 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1506 on 26 degrees of freedom
## Multiple R-squared:  0.3834, Adjusted R-squared:  0.2885 
## F-statistic: 4.041 on 4 and 26 DF,  p-value: 0.01115

Our final model has 26 residual degrees of freedom: na.omit() drops every row with a missing value (mostly states with no reported oil production), leaving 31 observations. The explanatory variables account for 38.34% of the observed variance in the response variable, and the overall regression is statistically significant (p-value = 0.011). Among the explanatory variables, logged GDP and Republican vote are statistically significant at the 0.05 level, while air quality (p = 0.080) and average IQ (p = 0.056) are only marginally significant, with p-values between 0.05 and 0.1.
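
As a check on the step() trace, the AIC values it reports can be reproduced by hand. For a linear model, step() scores each candidate as n * log(RSS / n) + 2k, where k counts the estimated coefficients including the intercept. The sketch below assumes n = 31 (26 residual degrees of freedom plus 5 coefficients in the final model) and takes the RSS values from the trace; the helper name `step_aic` is ours.

```r
# AIC as scored by step() for lm fits: n * log(RSS / n) + 2 * k
step_aic <- function(rss, n, k) n * log(rss / n) + 2 * k

n_obs <- 31  # assumption: 26 residual df + 5 coefficients

aic_full  <- step_aic(rss = 0.55669, n = n_obs, k = 8)  # 7 predictors + intercept
aic_final <- step_aic(rss = 0.58996, n = n_obs, k = 5)  # 4 predictors + intercept

# Should agree with the trace: -108.61 (start) and -112.81 (final step)
round(c(full = aic_full, final = aic_final), 2)
```

The final model has a larger RSS than the full model but a lower AIC, because the 2k penalty for three extra coefficients outweighs the small gain in fit.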

The coefficients table provides the estimated coefficient for each explanatory variable, which indicates the expected change in the response variable for a one-unit increase in that explanatory variable while holding the other variables constant. For example, a one-unit increase in log(GDP) is associated with a 0.0919 decrease in Clean_percent (about 9.2 percentage points, since Clean_percent is a proportion) while holding the other variables constant.

The p-values in the table indicate the level of statistical significance for each variable. A p-value below the significance level of 0.05 suggests that the variable is statistically significant in explaining the variation in the response variable. In this model, log(GDP) and Republican_vote are statistically significant, while Average_IQ (p = 0.056) falls just short of the 0.05 threshold.

The Adjusted R-squared value of 0.2885 indicates that the model explains approximately 28.85% of the variation in the response variable after adjusting for the number of explanatory variables. The F-statistic of 4.041 with a p-value of 0.01115 suggests that at least one of the explanatory variables is significant in explaining the response variable. The residual standard error of 0.1506 measures the typical size of the residuals, that is, the variation in the response variable that the model does not explain.
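
The adjusted R-squared and F-statistic follow directly from the multiple R-squared, the number of observations, and the number of predictors. The sketch below recomputes them from the rounded values printed by summary(), assuming n = 31 observations and p = 4 predictors, so small rounding differences from the printed output are expected.

```r
r2    <- 0.3834  # multiple R-squared from summary(TPmodel)
n_obs <- 31      # assumption: 26 residual df + 5 coefficients
p     <- 4       # predictors in the final model

df_resid <- n_obs - p - 1  # 26

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n_obs - 1) / df_resid

# Overall F-statistic and its upper-tail p-value
f_stat  <- (r2 / p) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, df1 = p, df2 = df_resid, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, f_stat = f_stat, p_value = p_value), 4)
```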

par(mfrow = c(2,2), mar=c(4,4,4,4))
plot(TPmodel)

par(mfrow = c(1,1))

Based on the above plots, there are no outliers that significantly impact the regression result.

T Test

party <- t.test(final_df$Clean_percent ~ final_df$party)
party
## 
##  Welch Two Sample t-test
## 
## data:  final_df$Clean_percent by final_df$party
## t = 1.0798, df = 47.659, p-value = 0.2857
## alternative hypothesis: true difference in means between group Blue and group Red is not equal to 0
## 95 percent confidence interval:
##  -0.05246162  0.17412949
## sample estimates:
## mean in group Blue  mean in group Red 
##          0.4071395          0.3463056

Based on the t-test result, the clean energy adoption rates for Red and Blue states are not statistically different. The test statistic t is 1.0798 with 47.659 degrees of freedom (df) and a p-value of 0.2857. The p-value is greater than the significance level of 0.05, so we fail to reject the null hypothesis. This means that we do not have enough evidence to conclude that there is a significant difference in the mean clean energy adoption rate between Blue and Red states.
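
The pieces of the Welch test output fit together in a way that can be checked by hand: the t-statistic is the difference in group means divided by its standard error, and the confidence interval is that difference plus or minus a t quantile times the standard error. The sketch below works backwards from the printed values, so it recovers the standard error from t instead of from the raw data.

```r
mean_blue <- 0.4071395
mean_red  <- 0.3463056
t_stat    <- 1.0798
df_welch  <- 47.659

diff_means <- mean_blue - mean_red
se_diff    <- diff_means / t_stat  # standard error implied by the reported t

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(p_value = p_value, lower = ci[1], upper = ci[2]), 4)
```

Because the interval contains 0, it leads to the same conclusion as the p-value: the data do not rule out equal mean adoption rates in the two groups.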

ggplot(final_df, aes(y=Clean_percent, x=party, fill = party)) +
  geom_boxplot()+
  labs(title = "Clean Energy Adoption Rate by Red or Blue State", y="Clean Energy Share (proportion)", x="Party", fill ="Party")+
  scale_fill_manual(values=c("skyblue3", "firebrick3"))

Conclusion

In this study, we sought to investigate the relationship between various demographic, economic, and political factors with the adoption of clean energy in the US states. Our analysis of the data revealed some interesting findings that can help us better understand this relationship.

Our regression analysis showed that state GDP and Republican vote are significant predictors of the percentage of clean energy adoption in the state. Average IQ is a marginally significant predictor of the percentage of clean energy adoption in the state. Of these, GDP and Republican vote had a negative relationship with clean energy adoption, meaning that as these variables increased, the percentage of clean energy adoption decreased. In contrast, average IQ had a positive relationship with clean energy adoption, suggesting that states with higher average IQ tend to have a higher percentage of clean energy adoption.

While the negative relationship between GDP and clean energy adoption may seem counterintuitive, it may reflect the fact that high GDP states have a stronger presence of energy-intensive industries, which may be resistant to clean energy adoption. Additionally, the negative relationship with Republican vote suggests that political ideology may also play a role in clean energy adoption, with Republican-leaning states being less likely to adopt clean energy.

Our t-test revealed that there is no statistically significant difference between the mean percentage of clean energy adoption in Blue and Red states. While the mean percentage of clean energy adoption was higher in Blue states, this difference was not statistically significant. However, this finding should be interpreted with caution, as our analysis did not control for other factors that may influence clean energy adoption.

Overall, our findings suggest that economic and political factors, and to a lesser extent demographic ones, are associated with clean energy adoption across US states. States with higher average IQ tend to have a higher percentage of clean energy adoption, while states with higher GDP, more Republican-leaning voters, and (marginally) more good air quality days tend to have a lower percentage. These findings may have important implications for policymakers seeking to promote clean energy adoption in the US and can serve as a starting point for further research in this area.